Text normalization and semantic indexing to enhance Instant Messaging and SMS spam filtering

Authors

  • Tiago A. Almeida
  • Tiago P. Silva
  • Igor Santos
  • José María Gómez Hidalgo
Abstract

The rapid popularization of smartphones has contributed to the growth of online Instant Messaging and SMS usage as an alternative means of communication. The increasing number of users, along with the trust they inherently have in their devices, makes such messages a propitious environment for spammers. In fact, reports clearly indicate that the volume of spam over Instant Messaging and SMS is increasing dramatically year by year. This is a challenging problem for traditional filtering methods, since such messages are usually fairly short and rife with slang, idioms, symbols and acronyms that make even tokenization a difficult task. In this scenario, this paper proposes and evaluates a method to normalize and expand original short and messy text messages in order to acquire better attributes and enhance classification performance. The proposed text processing approach is based on lexicographic and semantic dictionaries along with state-of-the-art techniques for semantic analysis and context detection. This technique is used to normalize terms and create new attributes in order to change and expand the original text samples, aiming to alleviate factors that can degrade the algorithms' performance, such as redundancies and inconsistencies. We have evaluated our approach with a public, real and non-encoded dataset along with several established machine learning methods. Our experiments were carefully designed to ensure statistically sound results, which indicate that the proposed text processing techniques can in fact enhance Instant Messaging and SMS spam filtering.

Preprint submitted to Knowledge-Based Systems, April 22, 2016.
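The normalize-then-expand idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two dictionaries below are hypothetical toy stand-ins for the lexicographic and semantic dictionaries the paper mentions, the function names are invented for illustration, and the context detection step is omitted entirely.

```python
import re

# Hypothetical toy dictionaries, for illustration only. The paper's actual
# lexicographic and semantic resources are not reproduced here.
SLANG_TO_STANDARD = {
    "u": "you",
    "r": "are",
    "2nite": "tonight",
    "txt": "text",
    "pls": "please",
}
RELATED_CONCEPTS = {
    "free": ["complimentary"],
    "prize": ["award"],
}


def tokenize(message):
    """Lowercase the message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", message.lower())


def normalize(tokens):
    """Replace known slang or abbreviations with a standard form."""
    return [SLANG_TO_STANDARD.get(token, token) for token in tokens]


def expand(tokens):
    """Append related terms as extra attributes, keeping the originals."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(RELATED_CONCEPTS.get(token, []))
    return expanded


def preprocess(message):
    """Tokenize, normalize, then expand a short message."""
    return expand(normalize(tokenize(message)))


if __name__ == "__main__":
    print(preprocess("U won a FREE prize, txt pls 2nite!"))
```

The resulting token list could then be fed to any bag-of-words classifier. Appending related terms rather than replacing the originals is a design choice of this sketch, so that the original evidence is preserved alongside the new attributes.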

Similar resources

On Detecting Messaging Abuse in Short Text Messages using Linguistic and Behavioral patterns

The use of short text messages in social media and instant messaging has become a popular communication channel in recent years. This rising popularity has caused an increase in messaging threats such as spam, phishing and malware. The processing of these short text message threats could pose additional challenges such as the presence of lexical variants, SMS-like...

SMS Spam Detection using Machine Learning Approach

Over recent years, as the popularity of mobile phone devices has increased, Short Message Service (SMS) has grown into a multi-billion dollar industry. At the same time, a reduction in the cost of messaging services has resulted in growth in unsolicited commercial advertisements (spam) being sent to mobile phones. In parts of Asia, up to 30% of text messages were spam in 2012. Lack of real data...

Filtering Network Spam Message using Approximated Logistic Regression

The development of telecom networks and the Internet provides effective means of communication. As an important means of communication, Short Messaging Service (SMS) over both telecom networks and the Internet has played an increasingly important role in daily life. However, it commonly suffers from spam SMS, which causes misunderstanding and enables fraud. The highly varying content, network environment make the identi...

Email Spam Filtering: A Systematic Review

Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with t...

SMS spam filtering: Methods and data

Mobile or SMS spam is a real and growing problem, primarily due to the availability of very cheap bulk pre-pay SMS packages and the fact that SMS engenders higher response rates as it is a trusted and personal service. SMS spam filtering is a relatively new task which inherits many issues and solutions from email spam filtering. However, it poses its own specific challenges. This paper motivates ...

Journal title:
  • Knowl.-Based Syst.

Volume 108

Publication date: 2016